Abstract 

In many online learning scenarios the loss functions are not memoryless, but rather depend on 
history. Our first contribution is a complete characterization of sufficient and necessary conditions for 
learning with memory, accompanied by a novel algorithm for this framework that attains the optimal 
$O(\sqrt{T})$ regret. This improves on previous online learning algorithms, which guaranteed $O(T^{2/3})$ regret and 
required more stringent conditions. As an application of the new technique, we address the classical 
problem in finance of constructing mean reverting portfolios. We design an efficient online learning 
algorithm for this problem, and provide guarantees for its performance. We complement our theoretical 
findings with an empirical study that verifies our theoretical results on financial data. 



1 Introduction 

In numerous online learning scenarios the environment is not completely oblivious to the decision maker, 
and the decision maker's historical actions affect her current state and future rewards. We are particularly 
concerned with scenarios in which this historic effect is relatively short-term and simple, in contrast to 
state-action models, for which better tailored reinforcement learning models have been devised [YMS09]. 

Thus, our focus is on online learning in which actions have short-term effects on future losses. This model 
was initially considered in the information theory community [MOSW06], with an eye on applications 
in compression, coding and portfolio selection. In this work, after describing our contributions to this 
framework, we apply this model to finding statistical arbitrage opportunities in financial market data. 

We start by studying the framework of online learning with memory. Our first result is a novel algorithm 
for this framework that attains optimal regret bounds on the order of $O(\sqrt{T})$, where $T$ is the number of 
prediction iterations. The algorithm is also computationally efficient, and we show that its assumptions 
are not only sufficient, but in fact necessary for any efficient algorithm for learning with memory. These 
bounds improve on [MOSW06], which attains a regret bound of $O(T^{2/3})$. Thus, our results show that 
the optimal regret bounds for this framework are of the same order as in standard memoryless learning, and 
the overhead for the more complicated model is in the memory effect only. 

Next, we proceed to study our motivating problem: constructing mean reverting portfolios. In the 
literature, "statistical arbitrage" refers to statistical mispricing of one or more assets based on the expected 
value of these assets. One of the most common trading strategies, known as "pairs trading", seeks to create 
a mean reverting portfolio by combining two different assets (typically using both short and long sales). 
Then, by buying the combined portfolio below its mean and selling it above, one can have an expected 
positive profit with low risk. 

This strategy consists of three main steps: choosing the two underlying assets, then finding an appropriate 
distribution of weights over them, and finally applying trading algorithms (which determine the buying and 
selling points) to maximize profit. In this work we focus on the first two steps with the following extensions: 
we allow portfolios that consist of more than two assets, and we consider an online scheme in which the 
distribution of weights can be updated (to some extent). That is, given a set of $n$ different assets we wish to 
isolate a subset of $k$ assets that has a large amount of mean reversion, and then determine a certain weight 
distribution over this subset. 

The problem of modifying the weights of a portfolio in order to maximize mean reversion is a learning 
problem with memory: the weights of previous iterations determine the price of the mixed asset, and thus 
the overall performance in terms of mean reversion. We cast this problem formally as a learning with 
memory problem, and utilize our new technique to solve it online. This yields the first sub-linear regret 
algorithm for this problem. Besides the theoretical sublinear regret bound, we test the resulting algorithm 
and demonstrate its effectiveness on financial data. 

1.1 Related work 

Statistical arbitrage, and in particular pairs trading strategies, first appeared in the mid-1980s [EG87]. 
Since then, a great deal of work has been done on the problem of assembling mean reverting portfolios, 
mostly using cointegration techniques (see [Vid11] for more comprehensive information). In order 
to quantify the amount of mean reversion in various portfolios, different proxies are often suggested, such 
as zero-crossing and predictability [Sch11, D'A11]. In this work, we consider a new proxy for mean reversion, 
which is aimed at maximizing fluctuation while keeping the mean close to zero. Furthermore, 
whereas classical cointegration techniques require a training period before applying a trading strategy (see for 
instance [AL10, AM12]), the online approach does not require one, and in addition provides a performance 
guarantee against the best mean reverting portfolio in hindsight. 

2 Online learning with memory 

Online learning is a game-theoretic optimization framework, where iteratively an online player chooses a 
decision $x_t$, and suffers loss $f_t(x_t)$. The loss function $f_t$ is chosen by an all-powerful adversary with full 
knowledge of our learning algorithm (see for instance [CBL06]). 

Here we consider the following extension. At iteration $t$, an online player chooses a decision $x_t \in \mathcal{K}$, 
where $\mathcal{K}$ is called the decision set. Then, an adversary chooses a loss function $f_t : \mathcal{K}^m \to \mathbb{R}$, and the online 
player suffers a loss of $f_t(x_{t-m+1}, \ldots, x_t)$. Notice that the loss at iteration $t$ depends on the previous $m - 1$ 
decisions of the player, as well as on the current decision. 

Our goal in this framework is to minimize the sum of losses over a predefined number of iterations $T$. A 
reasonable benchmark is to be not much worse than the total loss suffered by the best fixed decision in 
hindsight. More precisely, we define the regret as 

$$R_T = \sum_{t=m}^{T} f_t(x_{t-m+1}, \ldots, x_t) - \min_{x \in \mathcal{K}} \sum_{t=m}^{T} f_t(x, \ldots, x), \tag{1}$$

and wish to obtain efficient algorithms whose regret grows sublinearly in $T$, corresponding to an average 
per-round regret going to zero as $T$ increases.$^1$ 
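
To make Equation (1) concrete, the following minimal Python sketch evaluates the regret of a played sequence. It assumes scalar decisions and, as a simplification that the text does not make, replaces the minimum over the decision set by a minimum over a finite list of candidate points. The name `regret_with_memory` and the toy loss are illustrative only.

```python
from typing import Callable, Sequence


def regret_with_memory(
    losses: Sequence[Callable[..., float]],
    decisions: Sequence[float],
    candidates: Sequence[float],
    m: int,
) -> float:
    """Regret in the sense of Equation (1); losses[t - 1] plays the role of f_t."""
    T = len(decisions)
    rounds = range(m, T + 1)  # rounds t = m, ..., T; earlier rounds are ignored
    # Loss actually suffered: f_t(x_{t-m+1}, ..., x_t).
    suffered = sum(losses[t - 1](*decisions[t - m:t]) for t in rounds)
    # Loss of the best fixed decision, restricted to the candidate list (assumption).
    best_fixed = min(
        sum(losses[t - 1](*([x] * m)) for t in rounds) for x in candidates
    )
    return suffered - best_fixed


# Toy loss with memory m = 2: pay for the distance to 1 and for moving.
def toy_loss(previous: float, current: float) -> float:
    return (current - 1.0) ** 2 + (current - previous) ** 2


played = [0.0, 0.5, 1.0, 1.0]
regret = regret_with_memory([toy_loss] * 4, played, [0.0, 0.5, 1.0], m=2)
```

In this toy run the player pays only for the rounds in which it is still moving towards the best fixed point, which is exactly the memory overhead that the regret definition charges for.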

We henceforth make the following assumptions. The first two assumptions are standard and necessary 
for any regret minimization algorithm to apply, even without memory. Assumption 3 is the only new 
assumption we make and, as we explain, it is necessary if one considers efficient algorithms. 

1. The diameter of $\mathcal{K}$ is bounded, i.e., there exists $D > 0$ such that $\sup_{x, y \in \mathcal{K}} \|x - y\| \le D$, where $\|\cdot\|$ 
refers to the $\ell_2$ norm. 

2. There exists $G > 0$ such that 

$$\sup_{(x_{t-m+1}, \ldots, x_t),\, t} \|\nabla f_t(x_{t-m+1}, \ldots, x_t)\| \le G.$$

It follows that $f_t$ is Lipschitz continuous with Lipschitz constant $G$. 

3. Define $g_t(x) = f_t(x, \ldots, x)$. Then, we assume that $g_t(x)$ is convex in $x$, for all $t$. This assumption 
is essentially necessary for an efficient algorithm, since achieving a sublinear regret bound for $\{f_t\}_{t=1}^{T}$ 
also implies that $\sum_{t=1}^{T} g_t(x)$ can be minimized efficiently. 

2.1 Algorithm and analysis 

By Assumption 3 the functions $\{g_t\}_{t=1}^{T}$ are convex, and hence we can apply the Online Gradient Descent 
(OGD) algorithm of [Zin03] with some modifications. 

Algorithm 1 OGD with Memory 

1: Input: Learning rate $\eta$. 
2: Choose $x_1 \in \mathcal{K}$ arbitrarily. 
3: for $t = 1$ to $T$ do 
4:   Play $x_t$ and suffer loss $f_t(x_{t-m+1}, \ldots, x_t)$. 
5:   Set $x_{t+1} \leftarrow \Pi_{\mathcal{K}}\big(x_t - \eta \nabla g_t(x_t)\big)$. 
6: end for 

Here, $\Pi_{\mathcal{K}}$ is the Euclidean projection onto $\mathcal{K}$, i.e., 

$$\Pi_{\mathcal{K}}(y) = \arg\min_{x \in \mathcal{K}} \|y - x\|.$$

$^1$ The iterations in which $t < m$ are ignored: since we assume that the loss per iteration is bounded by a 
constant, this adds at most a constant to the final regret bound. 
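
The update rule of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that the decision set is a Euclidean ball (so that the projection has a closed form) and that the caller supplies the gradient of $g_t$; the names `project_ball`, `ogd_with_memory` and the toy gradient are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def project_ball(y: Sequence[float], radius: float) -> Vector:
    """Euclidean projection onto {x : ||x||_2 <= radius} (assumed decision set)."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= radius:
        return list(y)
    return [v * radius / norm for v in y]


def ogd_with_memory(
    grad_g: Callable[[int, Vector], Vector],
    x1: Sequence[float],
    eta: float,
    T: int,
    radius: float,
) -> List[Vector]:
    """Return the decisions x_1, ..., x_T produced by the update of Algorithm 1.

    grad_g(t, x) must return the gradient of g_t(x) = f_t(x, ..., x).
    """
    xs = [list(x1)]
    for t in range(1, T):
        x = xs[-1]
        g = grad_g(t, x)
        # Gradient step on g_t followed by projection back onto the ball.
        step = [xi - eta * gi for xi, gi in zip(x, g)]
        xs.append(project_ball(step, radius))
    return xs


# Toy example: g_t(x) = ||x - center||^2 for every t.
center = [0.5, 0.0]


def toy_grad(t: int, x: Vector) -> Vector:
    return [2.0 * (xi - ci) for xi, ci in zip(x, center)]


decisions = ogd_with_memory(toy_grad, x1=[0.0, 0.0], eta=0.1, T=200, radius=1.0)
```

Note that the update only ever differentiates $g_t$, never $f_t$ with respect to its individual arguments; the memory enters the analysis, not the update.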
For Algorithm 1 we can prove the following bound. 

Theorem 2.1. Let $G$ and $D$ be as defined in Section 2, and set $\eta = \frac{D}{G\sqrt{mT}}$. Then, Algorithm 1 achieves the 
following regret bound for $\{f_t\}_{t=1}^{T}$: 

$$R_T = \sum_{t=m}^{T} f_t(x_{t-m+1}, \ldots, x_t) - \min_{x \in \mathcal{K}} \sum_{t=m}^{T} f_t(x, \ldots, x) \le 2 \cdot GD\sqrt{mT}. \tag{2}$$

Proof. By applying Algorithm 1 to the loss functions $\{g_t\}_{t=1}^{T}$ we have that 

$$\sum_{t=m}^{T} g_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=m}^{T} g_t(x) \le \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2},$$

using the analysis of [Zin03], and from the definitions of $f_t$ and $g_t$ it follows that 

$$\sum_{t=m}^{T} f_t(x_t, \ldots, x_t) - \min_{x \in \mathcal{K}} \sum_{t=m}^{T} f_t(x, \ldots, x) \le \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}. \tag{3}$$

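On the other hand, since $f_t$ is Lipschitz continuous with Lipschitz constant $G$ we have: 

$$\begin{aligned}
|f_t(x_t, \ldots, x_t) - f_t(x_{t-m+1}, \ldots, x_t)|^2
&\le \big(G \cdot \|(x_t, \ldots, x_t) - (x_{t-m+1}, \ldots, x_t)\|\big)^2 = G^2 \cdot \sum_{j=1}^{m-1} \|x_t - x_{t-j}\|^2 \\
&\le G^2 \cdot \sum_{j=1}^{m-1} \sum_{l=1}^{j} \|x_{t-l+1} - x_{t-l}\|^2 \le G^2 \cdot \sum_{j=1}^{m-1} \sum_{l=1}^{j} \eta^2 \|\nabla g_{t-l}(x_{t-l})\|^2 \\
&\le G^2 \cdot \sum_{j=1}^{m-1} \sum_{l=1}^{j} \eta^2 G^2 = G^4 \cdot \sum_{j=1}^{m-1} \sum_{l=1}^{j} \eta^2 \le m^2 \eta^2 G^4,
\end{aligned}$$

which implies that 

$$|f_t(x_t, \ldots, x_t) - f_t(x_{t-m+1}, \ldots, x_t)| \le m \eta G^2.$$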



Summing the above for $t = m, \ldots, T$ we get: 

$$\Big| \sum_{t=m}^{T} f_t(x_t, \ldots, x_t) - \sum_{t=m}^{T} f_t(x_{t-m+1}, \ldots, x_t) \Big| \le \sum_{t=m}^{T} m \eta G^2 \le m \eta G^2 T. \tag{4}$$

Now, by combining Equations (3) and (4) we get the following inequality: 

$$\sum_{t=m}^{T} f_t(x_{t-m+1}, \ldots, x_t) - \min_{x \in \mathcal{K}} \sum_{t=m}^{T} f_t(x, \ldots, x) \le \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2} + m \eta G^2 T.$$

Finally, substituting $\eta = \frac{D}{G\sqrt{mT}}$ yields 

$$R_T = \sum_{t=m}^{T} f_t(x_{t-m+1}, \ldots, x_t) - \min_{x \in \mathcal{K}} \sum_{t=m}^{T} f_t(x, \ldots, x) \le 2 \cdot GD\sqrt{mT},$$

which completes the proof. □ 
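
The final substitution can be checked term by term. The short derivation below is a sketch that assumes the three-term bound $\frac{D^2}{2\eta} + \frac{\eta G^2 T}{2} + m\eta G^2 T$ and the step size $\eta = \frac{D}{G\sqrt{mT}}$ of Theorem 2.1:

$$\begin{aligned}
\frac{D^2}{2\eta} &= \frac{GD\sqrt{mT}}{2}, \qquad
\frac{\eta G^2 T}{2} = \frac{GD\sqrt{T}}{2\sqrt{m}} = \frac{GD\sqrt{mT}}{2m}, \qquad
m \eta G^2 T = GD\sqrt{mT}, \\
\frac{D^2}{2\eta} + \frac{\eta G^2 T}{2} + m \eta G^2 T
&= GD\sqrt{mT} \cdot \Big( \frac{1}{2} + \frac{1}{2m} + 1 \Big) \le 2 \cdot GD\sqrt{mT} \quad \text{for all } m \ge 1.
\end{aligned}$$

The last inequality is tight for $m = 1$, where the three coefficients sum to exactly 2.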

3 Application to finance 

In this section we use the technique just developed to find and exploit statistical arbitrage opportunities. 
Roughly speaking, the goal is to synthetically create a mean reverting portfolio, exploiting correlation 
between similar assets. That is, we seek a strategy that maintains weights over a predefined set of 
assets, such that the combined portfolio is mean reverting. 

As a first step we define a criterion for measuring mean reversion that is empirically well behaved. 
Unfortunately, this criterion is not convex (like most previously considered criteria), and we define 
a semi-definite relaxation to cope with the problem. Another difficulty comes from the very nature of the 
problem: weights of one iteration affect future performance, thus memory unavoidably comes into the 
picture. 

We proceed to formally define the new mean reversion criterion, its semi-definite relaxation, and the use of 
our new memory-learning algorithm in this model. 



3.1 Problem definition 

Tendency to return to the mean is not by itself a quantifiable criterion for mean reversion, and the literature offers 
several proxies to capture the notion of mean reversion, e.g., in [Sch11, D'A11]. In this work, we present 
a new criterion for mean reversion effectiveness: low squared mean and high variance. More precisely, we 
denote by $y_t \in \mathbb{R}^n$ the prices of $n$ assets at time $t$, and by $x_t \in \mathbb{R}^n$ a distribution of weights over these 
assets (we allow short selling, thus $x_t$ can contain negative entries). 

Since short selling is allowed, the norm of $x_t$ can be an arbitrary number, determined by the loan 
flexibility of the bank. Thus we assume without loss of generality that $\|x_t\|_2 = 1$, and define the following 
loss function: 

$$f_t(x_{t-m+1}, \ldots, x_t) = \Big( \sum_{i=0}^{m-1} x_{t-i}^\top y_{t-i} \Big)^2 - \lambda \cdot \sum_{i=0}^{m-1} \big( x_{t-i}^\top y_{t-i} \big)^2$$

for some $\lambda > 0$. Notice that minimizing this loss function iteratively yields a process $\{x_t^\top y_t\}_{t=1}^{T}$ such that 
its mean is close to 0, while its variance is maximized. Variance maximization is crucial here, since high 
variance processes tend to create a larger amount of statistical arbitrage opportunities. 
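
As an illustration of the loss function above, the following Python sketch evaluates it on one window of $m$ weight vectors and price vectors. The function and variable names are illustrative, and the price numbers are made up for the example.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def mean_reversion_loss(
    weights_window: Sequence[Sequence[float]],
    prices_window: Sequence[Sequence[float]],
    lam: float,
) -> float:
    """(sum_i x_{t-i}^T y_{t-i})^2 - lam * sum_i (x_{t-i}^T y_{t-i})^2 over one window."""
    # Portfolio value at each of the m time steps in the window.
    values = [dot(x, y) for x, y in zip(weights_window, prices_window)]
    squared_mean_term = sum(values) ** 2
    variance_term = sum(v * v for v in values)
    return squared_mean_term - lam * variance_term


# Two assets, memory m = 3, the same unit-norm weights in every step.
a = 1.0 / math.sqrt(2.0)
weights = [[a, -a]] * 3
prices = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]  # hypothetical prices
loss = mean_reversion_loss(weights, prices, lam=2.0)
```

In this toy window the portfolio values are $0$, $1/\sqrt{2}$ and $-1/\sqrt{2}$: they sum to zero while fluctuating, so the first term vanishes and the loss is negative, which is the behavior the criterion rewards.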
We use the regret criterion to measure our performance against the best distribution of weights in hindsight, 
and wish to obtain an online algorithm that generates a series $\{x_t\}_{t=1}^{T}$ such that 

$$\sum_{t=m}^{T} f_t(x_{t-m+1}, \ldots, x_t) - \min_{\|x\|_2 = 1} \sum_{t=m}^{T} f_t(x, \ldots, x) = o(T).$$

The previous distributions of our weights affect the mean reversion amount of our portfolio, and hence $f_t$ is 
a loss function with memory. As in Section 2, we define $g_t(x) = f_t(x, \ldots, x)$, and show that by obtaining 
a regret bound for $\{g_t\}_{t=1}^{T}$ we also guarantee a regret bound for $\{f_t\}_{t=1}^{T}$. 

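Notice that $g_t$ is of the form 

$$g_t(x) = x^\top A_t x - x^\top B_t x \tag{5}$$

for 

$$A_t = \sum_{i=0}^{m-1} y_{t-i} \cdot \sum_{i=0}^{m-1} y_{t-i}^\top$$

and 

$$B_t = \lambda \cdot \Big( \sum_{i=0}^{m-1} y_{t-i} y_{t-i}^\top \Big).$$

The function $g_t$ is not convex in general, and hence we cannot apply the technique detailed in Section 2 
straightforwardly. Instead, we define 

$$h_t(X) = X \circ A_t - X \circ B_t, \tag{6}$$

where 

$$X \circ A = \sum_{i=1}^{n} \sum_{j=1}^{n} X(i,j) \cdot A(i,j),$$

and $X$ is a PSD matrix with $\mathrm{Tr}(X) = 1$. 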

Now, the problem of minimizing $\sum_{t=m}^{T} h_t(X)$ is a PSD relaxation of the problem of minimizing $\sum_{t=m}^{T} g_t(x)$, 
and for the optimal solution 

$$x^* = \arg\min_{\|x\|_2 = 1} \sum_{t=m}^{T} g_t(x),$$

it holds in particular that 

$$\min_{X} \sum_{t=m}^{T} h_t(X) \le \sum_{t=m}^{T} h_t(x^* x^{*\top}) = \sum_{t=m}^{T} g_t(x^*).$$

Notice that $h_t$ is linear in $X$ for all $t$, and hence we can apply regret minimization techniques to the loss 
functions $\{h_t\}_{t=1}^{T}$. 
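
The relation between $g_t$ and its relaxation $h_t$ can be checked numerically. The sketch below builds $A_t$ and $B_t$ from one window of prices and confirms that $h_t(x x^\top)$ coincides with $f_t(x, \ldots, x)$ on a rank-one matrix. It uses plain Python lists instead of a linear algebra library; all names and the price numbers are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    return [[ui * vj for vj in v] for ui in u]


def relaxation_matrices(
    prices_window: Sequence[Sequence[float]], lam: float
) -> Tuple[Matrix, Matrix]:
    """Return (A_t, B_t) for one window y_{t-m+1}, ..., y_t of price vectors."""
    n = len(prices_window[0])
    total = [sum(y[k] for y in prices_window) for k in range(n)]
    A = outer(total, total)  # (sum_i y_{t-i}) (sum_i y_{t-i})^T
    B = [
        [lam * sum(y[i] * y[j] for y in prices_window) for j in range(n)]
        for i in range(n)
    ]  # lam * sum_i y_{t-i} y_{t-i}^T
    return A, B


def entrywise_inner(X: Matrix, M: Matrix) -> float:
    """The product written X o M in the text: sum of entrywise products."""
    return sum(xij * mij for xr, mr in zip(X, M) for xij, mij in zip(xr, mr))


def h(X: Matrix, A: Matrix, B: Matrix) -> float:
    return entrywise_inner(X, A) - entrywise_inner(X, B)


window = [[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]  # hypothetical prices, m = 3
x = [0.6, 0.8]                                  # unit-norm weights
A_t, B_t = relaxation_matrices(window, lam=2.0)
relaxed_value = h(outer(x, x), A_t, B_t)

# Direct evaluation of f_t(x, ..., x) for comparison.
values = [x[0] * y[0] + x[1] * y[1] for y in window]
direct_value = sum(values) ** 2 - 2.0 * sum(v * v for v in values)
```

Because $h_t$ is linear in $X$, its gradient is simply $A_t - B_t$, independent of the point at which it is evaluated.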
3.2 Parameter setting 

Throughout this section we use the following parameters and notations: 

1. The decision set $\mathcal{K}$ is defined as: 

$$\mathcal{K} = \{ X \mid X \text{ is PSD and } \mathrm{Tr}(X) = 1 \}.$$

From this definition we can bound the diameter of $\mathcal{K}$ by $\sup_{X, Y \in \mathcal{K}} \|X - Y\|_F \le D = \sqrt{2}$, where 
$\|\cdot\|_F$ refers to the Frobenius norm. 

2. There exists $G > 0$ such that 

$$\sup_{(x_{t-m+1}, \ldots, x_t),\, t} \|\nabla f_t(x_{t-m+1}, \ldots, x_t)\| \le G.$$

It follows that $f_t$ is Lipschitz continuous with Lipschitz constant $G$, and also that 

$$\sup_{X_t,\, t} \|\nabla h_t(X_t)\|_F \le G.$$

Clearly, the value of $G$ depends on the prices of the assets we are considering, and it is computed 
accordingly. 

3.3 Algorithm and analysis 

We turn now to present our online algorithm. 

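Algorithm 2 Online Statistical Arbitrage (OSA) 

1: Input: Learning rate $\eta$, $X_0 = \frac{1}{n} I_{n \times n}$. 
2: Randomize $x_0 \sim X_0$. 
3: for $t = 1$ to $(T - 1)$ do 
4:   Set $X_t \leftarrow \Pi_{\mathcal{K}}\big(X_{t-1} - \eta \nabla h_{t-1}(X_{t-1})\big)$. 
5:   Set $x_t = x_{t-1}$ w.p. $\big(1 - \frac{1}{m\sqrt{T}}\big)$. 
6:   Otherwise, randomize $x_t \sim X_t$. 
7:   Play $x_t$ and suffer loss $f_t(x_{t-m+1}, \ldots, x_t)$. 
8: end for 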



Here, $\Pi_{\mathcal{K}}$ refers to the following projection onto $\mathcal{K}$: 

$$\Pi_{\mathcal{K}}(X) = \arg\min_{Y \in \mathcal{K}} \|X - Y\|_F.$$

Also, $x_t \sim X_t$ refers to the eigenvector decomposition of the matrix $X_t$. I.e., we represent $X_t = \sum_{i=1}^{n} \lambda_i v_i v_i^\top$, 
where each $v_i$ is a unit vector and $\sum_{i=1}^{n} \lambda_i = 1$ with $\lambda_i \ge 0$. Then, we randomize the eigenvector $x_t = v_i$ 
with probability $\lambda_i$. Technically, this decomposition is possible due to the fact that $X_t$ is positive 
semi-definite with $\mathrm{Tr}(X_t) = 1$ for all $t$. 
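
The randomization and projection steps can be sketched as follows. This is a minimal illustration under stated assumptions: the eigen-decomposition itself is taken as given (it would come from a linear algebra library), the projection onto the trace-one PSD matrices is carried out by projecting the eigenvalues onto the probability simplex while keeping the eigenvectors, and the keep probability $1 - \frac{1}{m\sqrt{T}}$ is assumed. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def project_simplex(values: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_i >= 0, sum_i w_i = 1}.

    Applied to the eigenvalues of a symmetric matrix (eigenvectors unchanged),
    this gives its Frobenius-norm projection onto the trace-one PSD matrices.
    """
    threshold = 0.0
    cumulative = 0.0
    for k, v in enumerate(sorted(values, reverse=True), start=1):
        cumulative += v
        candidate = (cumulative - 1.0) / k
        if v - candidate > 0:
            threshold = candidate  # the last k that passes fixes the shift
    return [max(v - threshold, 0.0) for v in values]


def sample_eigenvector(
    eigenvalues: Sequence[float],
    eigenvectors: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[float]:
    """Draw v_i with probability lambda_i (eigenvalues >= 0 summing to 1)."""
    index = rng.choices(range(len(eigenvalues)), weights=eigenvalues, k=1)[0]
    return list(eigenvectors[index])


def lazy_resample(
    previous: Sequence[float],
    eigenvalues: Sequence[float],
    eigenvectors: Sequence[Sequence[float]],
    m: int,
    T: int,
    rng: random.Random,
) -> List[float]:
    """Keep the previous decision w.p. 1 - 1/(m sqrt(T)), otherwise resample."""
    keep_probability = 1.0 - 1.0 / (m * math.sqrt(T))  # assumed switching rate
    if rng.random() < keep_probability:
        return list(previous)
    return sample_eigenvector(eigenvalues, eigenvectors, rng)
```

Keeping the previous decision most of the time is what makes consecutive decisions close in expectation, since a freshly sampled eigenvector can be far from the previous one even when the matrices $X_t$ move slowly.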

For Algorithm 2 we can prove the following bound. 
Theorem 3.1. Set $\eta = \frac{D}{G\sqrt{m}\,T^{3/4}}$. Then, Algorithm 2 achieves the following regret bound for $\{f_t\}_{t=1}^{T}$: 

$$R_T = \sum_{t=m}^{T} E\big[f_t(x_{t-m+1}, \ldots, x_t)\big] - \min_{\|x\|_2 = 1} \sum_{t=m}^{T} f_t(x, \ldots, x) \le 3 \cdot \sqrt{m}\, GDT^{3/4}.$$

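Proof. By applying Algorithm 2 to the loss functions $\{h_t\}_{t=1}^{T}$ we get a series of matrices $\{X_t\}_{t=1}^{T}$ such 
that 

$$\sum_{t=m}^{T} h_t(X_t) - \min_{X \in \mathcal{K}} \sum_{t=m}^{T} h_t(X) \le \frac{D^2}{2\eta} + \frac{G^2}{2} \sum_{t=m}^{T} \eta,$$

using the analysis of [Zin03]. From the definitions of $g_t$ and $h_t$ it follows that 

$$\min_{X \in \mathcal{K}} \sum_{t=m}^{T} h_t(X) \le \min_{\|x\|_2 = 1} \sum_{t=m}^{T} g_t(x),$$

and hence 

$$\sum_{t=m}^{T} h_t(X_t) - \min_{\|x\|_2 = 1} \sum_{t=m}^{T} g_t(x) \le \frac{D^2}{2\eta} + \frac{G^2}{2} \sum_{t=m}^{T} \eta.$$

Now, from Lemma 3.2 we know that 

$$\Big| \sum_{t=m}^{T} h_t(X_t) - \sum_{t=m}^{T} E[g_t(x_t)] \Big| \le \sqrt{m}\, GDT^{3/4},$$

which yields 

$$\sum_{t=m}^{T} E[g_t(x_t)] - \min_{\|x\|_2 = 1} \sum_{t=m}^{T} g_t(x) \le \frac{D^2}{2\eta} + \frac{G^2}{2} \sum_{t=m}^{T} \eta + \sqrt{m}\, GDT^{3/4}.$$

From the definition of $g_t$ it follows that 

$$\sum_{t=m}^{T} E[f_t(x_t, \ldots, x_t)] - \min_{\|x\|_2 = 1} \sum_{t=m}^{T} f_t(x, \ldots, x) \le \frac{D^2}{2\eta} + \frac{G^2}{2} \sum_{t=m}^{T} \eta + \sqrt{m}\, GDT^{3/4}. \tag{7}$$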

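Next, we bound the distance between $x_t$ and $x_{t-1}$ in expectation for all $t$. Unlike in Section 2, we 
cannot rely on the closeness of $x_t$ and $x_{t-1}$ that follows from the step size of the online update. However, 
we can use the fact that $x_t \ne x_{t-1}$ with probability $\frac{1}{m\sqrt{T}}$, and therefore 

$$E\big[\|x_t - x_{t-1}\|^2\big] \le \frac{D^2}{m\sqrt{T}}.$$

Now, similarly to Section 2, we rely on the fact that $f_t$ is Lipschitz continuous with Lipschitz constant $G$, 
and thus 

$$\begin{aligned}
\big| E[f_t(x_t, \ldots, x_t)] - E[f_t(x_{t-m+1}, \ldots, x_t)] \big|^2
&\le E\big[ |f_t(x_t, \ldots, x_t) - f_t(x_{t-m+1}, \ldots, x_t)| \big]^2 \\
&\le \big( G \cdot E\big[ \|(x_t, \ldots, x_t) - (x_{t-m+1}, \ldots, x_t)\| \big] \big)^2 \\
&\le G^2 \cdot \sum_{j=1}^{m-1} E\big[ \|x_t - x_{t-j}\|^2 \big]
\le G^2 \cdot \sum_{j=1}^{m-1} \sum_{l=1}^{j} E\big[ \|x_{t-l+1} - x_{t-l}\|^2 \big] \\
&\le G^2 \cdot \sum_{j=1}^{m-1} \sum_{l=1}^{j} \frac{D^2}{m\sqrt{T}}
\le \frac{m G^2 D^2}{\sqrt{T}},
\end{aligned}$$

and it follows that 

$$\big| E[f_t(x_t, \ldots, x_t)] - E[f_t(x_{t-m+1}, \ldots, x_t)] \big| \le \frac{\sqrt{m}\, GD}{T^{1/4}}.$$

Summing the above for all $t$ yields 

$$\Big| \sum_{t=m}^{T} E[f_t(x_t, \ldots, x_t)] - \sum_{t=m}^{T} E[f_t(x_{t-m+1}, \ldots, x_t)] \Big| \le \sqrt{m}\, GDT^{3/4},$$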



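and by combining Equation (7) with the inequality above we get that 

$$\sum_{t=m}^{T} E[f_t(x_{t-m+1}, \ldots, x_t)] - \min_{\|x\|_2 = 1} \sum_{t=m}^{T} f_t(x, \ldots, x) \le \frac{D^2}{2\eta} + \frac{G^2}{2} \sum_{t=m}^{T} \eta + 2 \cdot \sqrt{m}\, GDT^{3/4}. \tag{8}$$

Finally, substituting $\eta = \frac{D}{G\sqrt{m}} T^{-3/4}$ yields 

$$R_T = \sum_{t=m}^{T} E[f_t(x_{t-m+1}, \ldots, x_t)] - \min_{\|x\|_2 = 1} \sum_{t=m}^{T} f_t(x, \ldots, x) \le 3 \cdot \sqrt{m}\, GDT^{3/4},$$

which completes the proof. □ 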
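
The last substitution can be checked term by term. The sketch below assumes the step size $\eta = \frac{D}{G\sqrt{m}\,T^{3/4}}$ and bounds $\sum_{t=m}^{T} \eta$ by $T\eta$; both are stated here as assumptions for the arithmetic.

$$\begin{aligned}
\frac{D^2}{2\eta} &= \frac{\sqrt{m}\, GDT^{3/4}}{2}, \qquad
\frac{G^2}{2} \cdot T\eta = \frac{GDT^{1/4}}{2\sqrt{m}} \le \frac{\sqrt{m}\, GDT^{3/4}}{2}, \\
\frac{D^2}{2\eta} + \frac{G^2}{2} \cdot T\eta + 2 \cdot \sqrt{m}\, GDT^{3/4}
&\le \Big( \frac{1}{2} + \frac{1}{2} + 2 \Big) \cdot \sqrt{m}\, GDT^{3/4} = 3 \cdot \sqrt{m}\, GDT^{3/4}.
\end{aligned}$$

Under these assumptions the gradient-descent terms contribute at most one third of the bound; the remaining two thirds come from the randomization (Lemma 3.2) and from the memory effect.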



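We now turn to prove Lemma 3.2. 

Lemma 3.2. Let $g_t$ and $h_t$ be as denoted in Equations (5) and (6). Then, Algorithm 2 generates online 
sequences $\{X_t\}_{t=1}^{T}$ and $\{x_t\}_{t=1}^{T}$ such that 

$$\Big| \sum_{t=m}^{T} h_t(X_t) - \sum_{t=m}^{T} E[g_t(x_t)] \Big| \le \sqrt{m}\, GDT^{3/4}.$$

Proof. Notice that $g_t(x_t) = h_t(x_t x_t^\top)$ from the definitions of $g_t$ and $h_t$. We show by induction that 

$$\big\| X_t - E[x_t x_t^\top] \big\|_F \le \frac{\sqrt{m}\, D}{T^{1/4}},$$

and from the Lipschitz property of $\{h_t\}_{t=1}^{T}$ this also implies that 

$$\big| h_t(X_t) - E[h_t(x_t x_t^\top)] \big| \le \frac{\sqrt{m}\, GD}{T^{1/4}}.$$

Thus, for $t = 0$ we have that 

$$\big\| X_0 - E[x_0 x_0^\top] \big\|_F = 0 \le \frac{\sqrt{m}\, D}{T^{1/4}},$$

since $x_0 \sim X_0$. For $t = 1$ we have that 

$$\begin{aligned}
\big\| X_1 - E[x_1 x_1^\top] \big\|_F
&= \Big\| X_1 - \Big(1 - \frac{1}{m\sqrt{T}}\Big) E[x_0 x_0^\top] - \frac{1}{m\sqrt{T}} X_1 \Big\|_F
= \Big(1 - \frac{1}{m\sqrt{T}}\Big) \big\| X_1 - X_0 + X_0 - E[x_0 x_0^\top] \big\|_F \\
&\le \Big(1 - \frac{1}{m\sqrt{T}}\Big) \Big( \|X_1 - X_0\|_F + \big\| X_0 - E[x_0 x_0^\top] \big\|_F \Big)
\le \Big(1 - \frac{1}{m\sqrt{T}}\Big) \frac{D}{\sqrt{m}\, T^{3/4}}
\le \frac{\sqrt{m}\, D}{T^{1/4}}.
\end{aligned}$$

Next, we assume that 

$$\big\| X_t - E[x_t x_t^\top] \big\|_F \le \frac{\sqrt{m}\, D}{T^{1/4}},$$

and prove that 

$$\big\| X_{t+1} - E[x_{t+1} x_{t+1}^\top] \big\|_F \le \frac{\sqrt{m}\, D}{T^{1/4}}.$$

Thus, 

$$\begin{aligned}
\big\| X_{t+1} - E[x_{t+1} x_{t+1}^\top] \big\|_F
&= \Big\| X_{t+1} - \Big(1 - \frac{1}{m\sqrt{T}}\Big) E[x_t x_t^\top] - \frac{1}{m\sqrt{T}} X_{t+1} \Big\|_F
= \Big(1 - \frac{1}{m\sqrt{T}}\Big) \big\| X_{t+1} - X_t + X_t - E[x_t x_t^\top] \big\|_F \\
&\le \Big(1 - \frac{1}{m\sqrt{T}}\Big) \Big( \|X_{t+1} - X_t\|_F + \big\| X_t - E[x_t x_t^\top] \big\|_F \Big)
\le \Big(1 - \frac{1}{m\sqrt{T}}\Big) \Big( \frac{D}{\sqrt{m}\, T^{3/4}} + \frac{\sqrt{m}\, D}{T^{1/4}} \Big) \\
&= \frac{\sqrt{m}\, D}{T^{1/4}} - \frac{D}{m^{3/2} T^{5/4}}
\le \frac{\sqrt{m}\, D}{T^{1/4}},
\end{aligned}$$

which completes the induction. Now, from the Lipschitz property of $\{h_t\}_{t=1}^{T}$ this implies that 

$$\big| h_t(X_t) - E[h_t(x_t x_t^\top)] \big| \le \frac{\sqrt{m}\, GD}{T^{1/4}},$$

and summing the above for all $t$ yields 

$$\Big| \sum_{t=m}^{T} h_t(X_t) - \sum_{t=m}^{T} E[g_t(x_t)] \Big| \le \sqrt{m}\, GDT^{3/4}.$$

□ 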



4 Experiments 

The following experiments demonstrate the effectiveness of the proposed algorithms, under two different 
settings. In the first setting, we consider the Dow Jones Index (30 stocks) and apply our criterion to isolate 
subsets of stocks that have a maximal amount of mean reversion. In the second setting, we consider pairs 
of sector related stocks, and apply the proposed algorithm to construct mean reverting portfolios ("pairs 
trading"). 

4.1 Comparison method 

We compare the amount of mean reversion of the different portfolios using the Portmanteau test from 
[LB78]. This test is aimed at determining whether a process is close to being a pure noise process, and hence 
can be effectively applied in our case as a measure for mean reversion. More accurately, we define $\Delta_t = 
x_t^\top y_t - x_{t-1}^\top y_{t-1}$ to be the daily change of our portfolio, and consequently 

$$Q_m = T(T+2) \sum_{k=1}^{L} \frac{\rho(k)^2}{T - k}$$

to be the Portmanteau statistic, where 

$$\rho(k) = \frac{\sum_{t=k+1}^{T} \Delta_t \Delta_{t-k}}{\sum_{t=1}^{T} \Delta_t^2}$$

is the sample autocorrelation at lag $k$, and $L$ is the number of autocorrelation lags (chosen to be 20 by 
default). Under the null hypothesis, the asymptotic distribution of $Q_m$ is chi-square with $L$ degrees of 
freedom, and therefore we can use the p-value as our measure for mean reversion. 

Additionally, we compare the revenue obtained by applying the following trading strategy to each of the 
portfolios: buy whenever it reaches a certain lower threshold, and sell whenever it reaches an upper 
threshold (we assume no transaction cost). We arbitrarily use $(-1)$ and $(+1)$ as lower and upper thresholds in our 
experiments, but similar results can be shown for any other choice. It is highly likely that dynamic trading 
strategies such as those presented in [GGR99, JY06] would yield higher revenue. However, the design of 
a trading strategy is completely orthogonal to our work, and our goal here is simply to compare various 
portfolios under a unified trading strategy. 
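
The two measurements described in this section can be sketched as follows. The portmanteau statistic is written in the Ljung-Box form with non-centered sample autocorrelations, which is an assumption about the exact variant used; converting it to a p-value needs a chi-square tail function (for example `scipy.stats.chi2.sf`), which is left out to keep the sketch dependency-free. The threshold rule holds at most one unit of the portfolio at a time, which is also an assumption. Function names are illustrative.

```python
from typing import Optional, Sequence


def sample_autocorrelation(deltas: Sequence[float], k: int) -> float:
    """rho(k) = sum_t delta_t * delta_{t-k} / sum_t delta_t^2 (non-centered)."""
    numerator = sum(deltas[t] * deltas[t - k] for t in range(k, len(deltas)))
    denominator = sum(d * d for d in deltas)
    return numerator / denominator


def portmanteau_statistic(deltas: Sequence[float], lags: int = 20) -> float:
    """Ljung-Box form: T (T + 2) * sum_{k=1}^{L} rho(k)^2 / (T - k)."""
    T = len(deltas)
    return T * (T + 2) * sum(
        sample_autocorrelation(deltas, k) ** 2 / (T - k)
        for k in range(1, lags + 1)
    )


def threshold_revenue(
    values: Sequence[float], lower: float = -1.0, upper: float = 1.0
) -> float:
    """Buy one unit at or below `lower`, sell it at or above `upper`, no costs."""
    revenue = 0.0
    entry: Optional[float] = None  # purchase price of the unit currently held
    for v in values:
        if entry is None and v <= lower:
            entry = v
        elif entry is not None and v >= upper:
            revenue += v - entry
            entry = None
    return revenue


daily_changes = [1.0, -1.0, 1.0, -1.0]
q = portmanteau_statistic(daily_changes, lags=1)
portfolio_values = [0.0, -1.2, 0.0, 1.1, -1.0, 1.0]
profit = threshold_revenue(portfolio_values)
```

A strongly alternating series such as `daily_changes` has a large lag-one autocorrelation and hence a large statistic, while a series close to pure noise would give a small one.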
4.2 Data set 

For the first setting, we consider time series of daily closing rates of all 30 stocks in the Dow Jones. For the 
second setting, we consider time series of daily closing rates of 8 pairs of stocks. The pairs are selected 
according to the sector they belong to (Financials, Energy, Telecommunication services, etc.). In both settings, 
we use data between the dates 01/01/2008 and 01/02/2013, taken from Yahoo! Finance. 



4.3 Isolating mean reverting portfolio 

In this setting, we test our criterion on the Dow Jones Index (30 stocks) and isolate subsets of stocks that 
have a large amount of mean reversion. This can be done by setting a certain threshold, and including only 
those stocks with corresponding weights above this threshold in our portfolio. We compare the 
performance of our criterion for various values of $\lambda$ (recall that a higher value of $\lambda$ corresponds to more fluctuating 
portfolios). 

Figure 1: Three mean reverting portfolios, based on the Dow Jones index, each assembled by executing the 
proposed algorithm with a certain $\lambda$ parameter. 





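                         p-value 
Index        $\lambda = 6$              $\lambda = 3$              $\lambda =$ [illegible] 
Dow Jones    $6.03 \cdot 10^{[\text{illegible}]}$   $6.47 \cdot 10^{[\text{illegible}]}$   $6.89 \cdot 10^{[\text{illegible}]}$ 

Table 1: p-values for the Portmanteau test of three mean reverting portfolios based on the Dow Jones index 
(in all cases $m = 10$). 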
This setting is aimed at testing the effect of the $\lambda$ parameter in the proposed criterion. In Figure 1 one 
can clearly see that the proposed algorithm generates distributions that create mean reverting portfolios, 
regardless of the value of $\lambda$. The significance of the Portmanteau test can be clearly seen in Table 1 for all 
three values. The supplementary material contains detailed information regarding the proposed portfolios 
for each of the values of $\lambda$. 



4.4 Pairs trading 

In this setting, we compare the performance of the proposed algorithm (OSA) to the trading strategy of 
distributing the weight proportionally to the price of the stocks (this strategy is referred to as "Benchmark"). 
For example, assume that the average prices (over a certain period of time) of stocks A and B are $10 and $20. Then, 
we would sell two shares of A against each share of B we buy, or vice versa. We also compare the 
performance to the offline optimal distribution of weights (this strategy is referred to as "Off-opt"), which 
follows from our criterion: 



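$$\arg\min_{\|x\|_2 = 1} \sum_{t=m}^{T} \Big[ \Big( \sum_{i=0}^{m-1} x^\top y_{t-i} \Big)^2 - \lambda \cdot \sum_{i=0}^{m-1} \big( x^\top y_{t-i} \big)^2 \Big].$$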

In all runs, we use the parameters $\lambda = 2$ and $m = 5$, which were chosen arbitrarily. 
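
One way to compute such an offline optimum is sketched below. Since the objective is the quadratic form $x^\top M x$ with $M = \sum_{t=m}^{T} (A_t - B_t)$, a unit-norm minimizer is an eigenvector of $M$ with the smallest eigenvalue. The sketch finds it by power iteration on a shifted matrix; this particular numerical method, the starting vector and the function name are choices made for illustration and are not taken from the experiments.

```python
import math
from typing import List, Sequence


def smallest_eigenvector(
    M: Sequence[Sequence[float]], iterations: int = 1000
) -> List[float]:
    """Unit vector minimizing x^T M x for a symmetric matrix M."""
    n = len(M)
    # The Frobenius norm upper-bounds every |eigenvalue| of M, so
    # S = shift * I - M is PSD and its top eigenvector is the one we want.
    shift = math.sqrt(sum(M[i][j] ** 2 for i in range(n) for j in range(n)))
    S = [[(shift if i == j else 0.0) - M[i][j] for j in range(n)] for i in range(n)]
    # Arbitrary start; it must not be orthogonal to the target eigenvector.
    x = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        y = [sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


# Toy matrix standing in for the sum of (A_t - B_t) over t = m, ..., T.
weights = smallest_eigenvector([[2.0, 1.0], [1.0, 2.0]])
```

For the toy matrix the minimizer puts weights of equal magnitude and opposite sign on the two assets, i.e. a long position in one asset against a short position in the other.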

In Tables 2 and 3 one can see the advantage of the proposed algorithm (OSA) over the offline 
benchmarks in the compared parameters: revenue and closeness to pure noise. In Figure 2(b) we demonstrate 
the performance of the proposed algorithm visually, by applying it to the pair AT&T and Verizon 
(Telecommunication services). 





                              p-value 
Pair           Benchmark              Off-opt                OSA 
MSFT & INTC    $5.13 \cdot 10^{-4}$   $5.1 \cdot 10^{-4}$    $2 \cdot 10^{-4}$ 
KO & PEP       0.2906                 0.2786                 0.1746 
T & VZ         [missing]              [missing]              [missing] 
MMM & DD       0.5694                 0.5619                 0.5896 
PFE & MRK      0.1984                 0.1973                 0.3247 
JNJ & PG       0.2484                 0.2471                 0.1687 
XOM & CVX      0.2374                 0.2671                 0.4083 
HD & WMT       0.5934                 0.5679                 0.5376 

Table 2: p-values for the Portmanteau test (smaller is better). The results are averaged over 50 runs for 
stability. 
(a) Historical closing rates (Verizon and AT&T, plotted against time). 
(b) Value of mean reverting portfolio (plotted against time). 
Figure 2: Experimental results for T and VZ. 
                    Revenue (USD) 
Pair           Benchmark    Off-opt    OSA 
MSFT & INTC    18           18         17.52 
KO & PEP       2            2          14.28 
T & VZ         14           14         22.28 
MMM & DD       8            8          28.76 
PFE & MRK      10           10         12.96 
JNJ & PG       32           32         29.88 
XOM & CVX      2            6          32.24 
HD & WMT       10           10         23.04 

Table 3: Revenues in the pairs trading setting. The results are averaged over 50 runs for stability. 

5 Conclusion 

Motivated by financial applications, we have considered the setting of online learning with memory, and 
gave efficient and asymptotically-optimal regret algorithms for this general setting. The application to 
constructing mean-reverting instruments is explored theoretically and empirically. 

The following research directions remain: First, whereas the proposed algorithm for the online learning 
with memory framework is optimal in the number of iterations $T$, the optimal dependence on the memory 
parameter $m$ remains unknown. We conjecture that the regret bound of $\Theta(\sqrt{mT})$ is tight. Second, it would 
be interesting to explore the optimal trading strategy in conjunction with a mean reverting portfolio. 